TITLE: NETWORK BASED CLASSIFIED INFORMATION SYSTEMS
FIELD OF INVENTION 

post*) on a netwoft. ana. to Mb p a|)es (or use h ^ aM ln-1 ^ ro " pases 

J! 9 Such ,ntofmatl «« systems and databases typicaly include sete If 

cTne^? tatht 1"°" COmfn0n aCCCSS t0 ™«»P documents on computers 

20 SSSSs? ^f^^^ -"=ss«M 

. and the markup documents themselves ore commonly called "web pages" A web D^eS 

25 C ^ S ^"^^ aCWSsible t0 A web page is transported 

I^EJSSSIL* f TS? ° 0rnpUler <hrOU9h W -""*-»' compos 
Internet based information systems, these terms are used for convenience.

BACKGROUND TO THE INVENTION

3& ImL^cSSr/ 31 ^ ~ 100 W8b ««■ on the Internet end that the 
Zm^ oS^ ° f *— Pa » e8 «■*"»■*" concerning 

tocatmg such mformabon .s mereaang fa****, ,,,3 growth h ^ nufnbe , Qf web pgge5 ^ 

SZZTJL T£ ] I* ° f W8b PSSes P 03ted (stored on computer readable 

Se^ Lt^? T*"* °" *■ and ^ ^arcn engines' to use 

— ^ C ' 8ated aU,0mato,, y by the use of Veb oLers' *,hich 
40 ITtSTolZ Zl 00 ne * 0,h ,0 ^ successive web pages and (ii) 

F^cTac^ orS^Z .T 98 •"" untoBd a 9 3 ^ the network address (eg Interne 
me^b oaoaTplS,!^ J*** 1 ° r universal resource locator (URL) at which 
*e Sen ?o teioSSrn ^ URL and URI <M*mi Resource Identifier) 

ui^ ^ ^e^L rn ? n 'S to *"* ^ ***■■• and filing system paths 

browsers <■"■»» swpfclw^ 
The search for web pages of interest using search engines leaves much to be desired:

• simple searches (those using a few keywords in simple combinations) often yield far too
many web page references (URLs) to permit them to be interrogated one-by-one,

• complex searches (those using many keywords and/or complex Boolean expressions)
require considerable expertise to undertake,

• even using optimum search criteria, many irrelevant web pages are referenced because of
inconsistent use of terminology by those who author the original web pages,

• even using optimum search criteria, many relevant pages are missed, again because of
inconsistent use of terminology by web page authors, and

• because items of information included in the body of web pages cannot be 'understood' or
associated in useful ways by web crawlers; that is, recognised as, say, a surname, a street
name, a geographic locality, or type of goods or services and, say, a surname strongly
associated with a street name, a geographic locality, or a type of goods or service.

The result is that information provided by search engines from databases which are
automatically compiled using web crawlers is a very poor equivalent of the common Yellow
Pages and White Pages directories which serve the telephone industry (though these
directories are not, of course, automatically compiled from web pages).

In an attempt to improve the usefulness of automatically compiled network databases, some
search engine providers make use of information contained in URLs, such as the country code
and top level domain name codes such as 'com', 'edu', 'net' and 'org' which are sometimes used
to signify the subject matter of web pages. It has been proposed to add more content
classifying codes to URLs (eg. 'chem' to signify chemical subject matter) to allow specialised
databases - national, commercial, chemical, etc - to be generated. However, this proposal
has serious drawbacks:

• URLs are Internet addresses and it is in principle undesirable to confuse the address
function of a URL with that of representing a list of web page classifications or contact
details.

• A URL is an inappropriate container of multiple web page classification codes and contact
details because the length of the URL would cause it to become unwieldy as an Internet
address.

• Including in a URL classification codes drawn from a list of thousands of codes would
compromise the mnemonic quality of Internet addresses such as 'www.yellowpages.com'.

• There is substantial overlap in the subject matter contained in web pages having the
various top level domain name codes.

• There is no consensus on, or standard for, content classification codes in URLs.

A proposal to add content classification data to web pages has arisen from the wish to
identify pages containing material that may be offensive to some viewers, or should not be
accessed by minors. The Platform for Internet Content Selection (PICS) (see
http://www.w3.org/pub/WWW/PICS and other documents at www.w3.org) is a web page
ratings standard similar in principle to the ratings systems for motion pictures. This system
allows web page authors to 'internally' self classify their pages through use of the '<meta...>'
HTML element. Alternatively, 'external' PICS ratings of web pages may be obtained from
ratings service providers accessed each time a URL is selected. In practice, the ratings service
providers have adopted a very limited range of web page classifications. For example, Ararat
Software's Commercial Rating System (see http://www.ararat.com/ratings/ararat10.html)
provides just 5 categories of web page content: commercial content, technical/customer
support, ordering information, downloading information and contact information. In other
categories. SafeSurf m>J/Z^J7^, ^'^-^c.om/faq.html) provides 4

5 categories. None .of tta ^daT^T? ^ ^ an ^er^ebpages.coriWVVPl.o/ provides 11 
prodL or sublet j£ SSTsSvTf 3 ' 0 " ° f W6b P398S by 
Rather, the caSriTa^Se?^ V^Tf"' searching for web pages. 

be used for the automated creation of Y^or^ jf ^ ^ emS ^ " 0t inlende<1 to 
10 and are unsuitable for that teSS tteTcfn 2£ T"" - W6b pa ° es 

the ratings data may only hmM^ tnt > L ^T"' ""^ dataas Fu «**. 

classified and^e das^X^tT^T C ° n,enl °' ^ to te 
type of lext/mcf. SwZSK £££ if" 1 * n<>fKH ™ L data f,te * MIME 

HTML encoded ^^l^l^T T^^^'^^^ 

20 elation of Yeflow or wC.t oSl* TS UnSUfted to ^ autom8ted 

tnmmo because d^tSSXj^ PageS « 

web pages. ^ W Uie MCF D «>posal is not stored in HTML encoded 

*«-sid™e- V card^^ 

the non-standard "te*t/x JrJ^ —T- • (M,ME Contem 7 VP W of teat/plain' or 

White lC^2S^?jSS ^ infofmafon ' equiv8te^, to a " 

(SMTP) cJKiSfl £ fS? 0 " 3 netW0 * U6ln9 Sim P te M^fl Transfer Protocol 

in the web paae 

2 ^JjfSSi ft^ h ^«P^.ming.cofa' tt Card.vc^S y 
provides for meTc^si^^Slt ^ fcmBt 18 193S > 

recommends ^at S ol^^^l ^ intoftna ^ The vCard specification 

<o ^lisM^^Hm^S p,^'.'** < " la ""^ acco """S «> »» VCatd 

coo^ationofmoaSS^.^ " ««■■■* duplication of data and 
be done to „«o» T^^^~IT^? mmM •»<* «»«»» pages. Tnis tntt 

portions of web D aaes to h» m^osJi "\ associated file or vice versa. Also, to allow 
restriction of address data in a vCard to untagged ordinally organised fields is inflexible. For
example, multiple instances of extended parts of the address are not possible. Also,
names, addresses and telephone numbers and so forth are insufficiently

The Online Computer Library Center Inc (OCLC, Dublin, Ohio, USA) proposal, known as the
Dublin Core, proposes classifying scholarly web pages by subject (topic of the work, or
keywords that describe the content of the work), title, author, publisher, other agent, date,
object type (genre of the object such as home page, novel, poem etc), form, identifier, source,
language and (spatial and temporal) coverage (see
http://www.oclc.org:5046/^e«htniki,et a .htrnl and other documents at www.oclc.org). This
proposal does not include industry, service, product or subject classifications. It also does not
include contact details. Names such as that of the author are not specified in sufficient detail to

allow identification of parts such as first and last names. The proposal specifies

that the details are encoded using the <meta...> element in the <head> of web pages. The
proposal is unsuited to the automated creation of Yellow or White pages like databases from
web pages because the proposal does not provide for classification of web pages and does
not provide adequate contact details. Further, the use of keywords for describing the content
of the work adds very little to the effectiveness of indexing of web pages since the web pages
are usually indexed on every word of their content and most often the key words would simply
be a duplication of words already contained in the document.

It has been proposed to use the Dewey Decimal System (see
http://orc.rsch.oclc.org:6109/eval_dc.html and http://orc.rsch.oclc.org:6109/bintro.html) to rank
electronic documents against a Dewey Decimal subject classification. The proposal suggests
automatically assigning Dewey Decimal subject classification codes to documents during
automated indexing and cataloguing but does not specify the exact nature of the assignment
although it is implied that the codes are stored separately from the documents. The proposal
admits that such automated classification is less satisfactory than human classification. The
proposal is unsuited to the automated creation of Yellow or White pages like databases from
web pages because the accuracy of classification is inadequate, and because it does not
provide for inclusion of service or product classifications and does not provide for inclusion
of contact details. Deriving a subject classification code from an analysis of every word and
phrase in a web page is computationally expensive.

The HTML 3.0 standard (see page 23 of the www.w3.org document 'draft-ietf-html-specv3-00.txt')
provides 'class' as an attribute of almost all HTML '<body>' elements. The 'class'
attribute is intended to be used with style sheets. Style sheets provide a means by which the
display of HTML documents may be altered to suit the needs of different classes of browser
users. For example, <div class="appendix"> could be used to define a division that acts as an
appendix. <h2 class="section"> could be used to define a level 2 header that acts as a section
header. Any string of characters may be defined for those purposes. The

'class' attribute has never been suggested for holding goods and services
classifications, nor would such a use be appropriate, as it is not in any way desirable to
confuse the style sheet function of the 'class' attribute.

The HTML 3.0 and later standards provided the HTML elements '<person>' and '<address>'
but provided no means of specifying or of validating the content of those
elements. A name may be written as first name followed by last name or last name
followed by first name. Similarly, different conventions exist for writing addresses. Similar
ambiguities arise in the ill defined format of the HTML elements '<person>' and '<address>'.
As such they are of little use in the automatic compilation of searchable databases.

The XML language (see: http://www.textuality.com/sgml-erb/WD-xml.html) was developed to extend
HTML so that software vendors can add new elements and new element attributes to HTML
which are not specifically defined in any HTML standard. The intention is to ensure that all new
elements and attributes could be parsed by all XML parsers even if the new items held no
significance for any particular XML parser. However, like HTML, XML does not provide a

^5 h! /a Z^L1 0 scan mML web pages posted «• a -two*. 

Sfri e ^ , ^ eS0 ° ,n 3U and ^^'"cp.com provide dassffied advertisements of 
£ 2?12E?7 totheWebpa9eS <* " a V"9 advertisers or subacnbers" ere 

Z addressea which approximate the White Pages directories, listing 

l£Z3L? ^SSL*" 1 0,gan,sa,ions and (eg http^/www.bigboolccom 

20 2 SE^"*^^ meSe emaa directories «•**» «« 2 manuany 

6nqU,rere t0 to 3Ware 0f and ,0 find *• d^ory enquiry web 

SStaiaTS '^5^ 9en8rated by 8C8nninfl Web ^ ^g wabcLers 
since there is no adequate mechanism to relate email addresses to the names of people and
organisations and their other contact details which may also exist in the same web page.

OBJECTIVES OF THE INVENTION

The object of the invention is to provide improved methods for automatically building
databases of classification, contact and/or geographical information by using web
crawlers to scan web pages posted on a network. [For convenience, this information is
collectively referred to as CCG-data.]

2Sl^ he 2f n,,a, °'* eft * 8 arB 10 P"^ 6 methods ,or and/or delaying CCG- 

□ a ?J^ t !!f Pa9eSaC ^ SSed by browsere - for automatically extracting CCOdata from web 

TJZ^Z^ZS?!"* f0f USin9 Same " and/or to P^e methods for searching 
automatically compiled databases using such data.

It is a further object of the invention to provide a new form of web page which is
better suited to the automatic compilation (using web crawlers) of databases constructed by
the automatic scanning of many such pages posted on a network.

OUTLINE OF THE INVENTION

The invention is based upon the realisation that highly useful databases can be automatically 

e^ed cSnh^"^^ ***** on a » °™ or more HTML 

Z£H £ P ^ w ^ dUded P a S«- A CCG phrase is one containing CCG- 

45 o Ire llZ^^l^^ ** CCG P»««* ™y afeo indude one 

£7» 7?££JT* ** pafle auth0f wfth ^ ovef how 1,16 CCG ^ ata " 

^S^Z^l * ' BdU ? d tfSOmeofthe CCG ^ at » the coded CCG phrases can be 
50 ^- We " as ^ ^ to update databases. Errors due to exactly 

duplicated data are also eliminated. Accordingly, it is envisaged that CCG phrases may include
one or more items which provide the web page author with control over how the CCG-data is
displayed by a browser.

HTML (including version 2 and version 3) and XML are evolving applications (sub-sets or
dialects) of ISO Standard 8879:1986 known as Standard Generalised Markup Language
(SGML). HTML, in large part, is a language used to describe how text (unstructured data) and
graphics are to be formatted for display. The HTML language consists of a finite number of
'elements' (for example: '<BR>' where "BR" is the element name, also called the tag name)
which may contain 'attributes' (for example: '<DL COMPACT>' where "COMPACT" is an
attribute named "COMPACT") and may contain values associated with attributes (for example:
'<FONT SIZE=+1>' where +1 is the attribute value of the attribute named "SIZE"). XML is a
language used to describe structured data. The XML language is similarly composed of
elements, attributes and values with a similar syntax to HTML but unlike HTML the element
names which may be used are not restricted and the meaning of the XML data may be
interpreted in any convenient manner. While the XML language is mute about how data
described by XML is to be formatted for display, the data may be used by computer programs
for any purpose including description of how XML coded data is displayed. However, due to its
historic importance in connection with web pages, the term "HTML" is herein used to refer to all
markup languages which are subsets or complete sets of the SGML language. In particular,
the term "HTML encoded CCG phrase" and the synonymous term "CCG phrase" are herein
used to refer to CCG-data encoded in a subset or complete set of the SGML language.
Herein, a 'web page' is a document adapted to be or actually accessible through a network
and encoded in a subset or complete set of the SGML language.
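
The distinction drawn above between element names, attribute names and attribute values can be illustrated with a short sketch. It assumes Python and its standard html.parser module, neither of which is part of the specification; the class name TagLister and the function list_tags are hypothetical.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Records (element name, [(attribute name, value), ...]) per start tag."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases element and attribute names; an attribute
        # written without a value (e.g. COMPACT) is reported with value None.
        self.tags.append((tag, attrs))


def list_tags(markup):
    """Return the start tags found in a fragment of markup, in order."""
    parser = TagLister()
    parser.feed(markup)
    parser.close()
    return parser.tags


if __name__ == "__main__":
    for name, attributes in list_tags('<BR><DL COMPACT><FONT SIZE="+1">x</FONT></DL>'):
        print(name, attributes)
```

The three tags used are the ones given as examples in the paragraph above; the parser simply reports what a CCG-aware program would have to inspect.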

For convenience, CCG items in HTML encoded CCG phrases, whether they are syntactically
represented as elements or as attributes, will be referred to hereinafter as CCG attributes.

A CCG phrase includes at least one of the following identifiable types of CCG-data attributes:

• industry, product, service, and/or subject classifications,

• contact categories, contact person(s) and/or organisation(s) names, titles or
associations, contact details including physical and postal addresses, telephone and
fax numbers, email and Internet or network addresses or locations, public keys, and

• geographic location details.

A CCG phrase may also include any of the following identifiable types of CCG control
attributes:

• database control attributes to indicate which parts of the data are to be used to
update databases, and

• display control attributes to indicate how browsers are to display the data.

By virtue of occurring in the same CCG phrase, a plurality of CCG-data attributes are
associated with each other.

By virtue of their occurrence in the same CCG phrase, CCG-data attributes are identified as
a set of associated attributes. However, the degree of association between attributes can be
controlled by the inclusion in the phrase of database control attributes.

The start and end of CCG phrases should be identifiable to clearly distinguish these phrases
from other data. To identify the beginning and end of a CCG phrase, at least one HTML
element should have a CCG specific HTML element name or CCG specific attribute name or
CCG specific attribute value. Each CCG attribute consists, with or without other incidental
characters, of a CCG attribute name and/or a CCG value or values. Preferably each CCG
phrase is contained in the '<body>' of the web page.

5 Two examples of a CCG specific HTML element are: -<CCG >' or -<CCG />• or 

e»mp te is '^CG "^Sl? ^ ^ ^ end of the CCG phrase.) A teas savory 

53TuL tilS, ? araCtefS " CCG " aftef H ™ L «™n«nt element name 
m ' Jr - * ^ «he comment contains CCG-data. An example of the use of a CCG 
specific attribute name is: '<START CCG>'...'<END CCG>'. An example of the use of a CCG

specific value is: '<START TYPE=CCG>'...'<END TYPE=CCG>'. SSSL? 2S

element attribute value 'CCG" string of the examples. 

15 £L^* C wV^ and ,<CCG - are ""l"** ^ ™ s < HTML specifications but 

2"c5£^T n ! ^JT* 6 * '<* 8nd ^ coda, are preferred where dMayTf 
the CCG data is not required and compatibility with older browsers is required (eg CCG
phrases containing only classification values).
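
The marker conventions mentioned above (a CCG specific element name, a CCG specific attribute name, a CCG specific attribute value, and the characters "CCG" at the start of an HTML comment) could be recognised by a scanning program roughly as follows. This is only a sketch under stated assumptions: it uses Python's standard html.parser, the class name CCGMarkerFinder and the function find_markers are hypothetical, and the START element is taken from the examples in the text.

```python
from html.parser import HTMLParser


class CCGMarkerFinder(HTMLParser):
    """Notes which CCG marker convention each phrase start uses."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        names = [name for name, _ in attrs]
        values = [(value or "").lower() for _, value in attrs]
        if tag == "ccg":                          # <CCG ...>
            self.found.append("element name")
        elif tag == "start" and "ccg" in names:   # <START CCG>
            self.found.append("attribute name")
        elif tag == "start" and "ccg" in values:  # <START TYPE=CCG>
            self.found.append("attribute value")

    def handle_comment(self, data):
        # Comment form: the characters "CCG" directly after "<!--".
        if data.lstrip().upper().startswith("CCG"):
            self.found.append("comment")


def find_markers(page):
    """Return the marker conventions encountered in a page, in order."""
    finder = CCGMarkerFinder()
    finder.feed(page)
    finder.close()
    return finder.found
```

A crawler built along these lines would treat everything between a recognised start marker and its matching end marker as a CCG phrase; the end markers are ignored here to keep the sketch short.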


From one aspect, therefore, the invention comprises a web page for posting on a network the 
Z ET ^JS**** b * *• h**n of at .east one CCG phrase in the^' of 
*ZJ3L h , .-/ h . raS ! ^ *** 0,31 CCG atbibutes itemed therein, are 
25 SSJtS J "JT ^J? "™ L ^ «»« and ' or <» «™»- comprint web 
ZS, k Consbuction of dattbaie. of classified information, and/or (in) HTML 

«™pl*nt browsers for disp^ VJ 

From another aspect the invention comprises a method of constructing web pages of the
above-mentioned type. Web pages may be constructed on digital computers using simple
text editors such as Microsoft Windows Notepad or, preferably, purpose built human controlled
editors or automated composing programs which embody knowledge of HTML and CCG
syntax. Whichever process is used, CCG attributes are selected and inserted
and organised to form valid CCG phrases in HTML encoded documents
which are posted on computer readable storage devices of computers connected
to a network so that the documents are generally available to users of the
network.

S .S!^!?"* inVentk,n COmpriSGS a method °' PopuWino a database with CCG- 
40 a l^T, ^ m PafleS - POCted on a n °»"°* are successively retrieved by 

iSSL^SXS 0 ^^ WebCra * ter) and CCG Phrases contained therein are 

* " f"* Some oftheCCG a **utes *™d *Hhin the CCG phrases are extracted. 

Gene^S Trr^™? detem *» »» type of data in the associated values. 

o^ra^ic dL ^ t ?K Ute8 ■* ^ 0,056 reta6n9 to contact and. 

45 nSon to d^t ?♦ 36 ^ COn,R,,S ^ *» attributes °< ° r •» «* *> 

SL S S n ^ ^ to ^ Of course, the CCG^ata 

a^da^ ^ b89 ^ ned to ^« ^ web page classificaVons and URLs wEile 
h ^2? 1 f y ^ bee " des * ned to ^ c °n^^ detans. Databases also differ 
in their internal representation of data and means of associating data. For example, some use
flat file tables, others use pointers to data to create network associations while others use
hashing and buckets.

The conventional nomenclature differs considerably between different types of database.
Depending on the particular database nomenclature, data of the same type is said to be stored
in table columns, fields, attributes and properties. The terms column and field are somewhat
related to the physical representation of the data in files while attribute and property are more
related to the logical representation of data. To avoid confusion with the terms "HTML
attribute", "CCG attribute" or just "attribute", hereinafter a database property means both a type
of data stored in the database and a place in the database where data of the same type is
stored. Database properties are referred to by a name ('property name') or similar reference
and contain values. For example, a database property with the name "City name" and which
contains values which are all the names of cities may be defined as a 'City name' type
database property.

Whichever style of database is used, it is preferred that the database update program relate
the CCG attributes to corresponding database properties used by the database update
process so that the database property values are updated with CCG values in a manner which
preserves the distinctness, content and meaning of the CCG values and, preferably, preserves
the CCG value associations expressed in the CCG phrase as sets of associated database
property values of different types.
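
As an illustration of relating CCG attributes to database properties, the following Python sketch maps the attributes of one CCG phrase onto one set of associated property values. The mapping table, the property names other than "City name", and the function name are assumptions made for the example; the short attribute names (ON, ACN, APC, CN, CC) follow the suggested names given in Example 1.

```python
# Hypothetical mapping from CCG attribute names to database property names.
CCG_TO_PROPERTY = {
    "ON": "Organisation name",
    "ACN": "City name",
    "APC": "Post code",
    "CN": "Classification name",
    "CC": "Classification codes",
}


def phrase_to_property_set(ccg_attributes):
    """Turn the (name, value) pairs of one CCG phrase into one property set.

    Values are copied unchanged so their content and meaning are preserved,
    and keeping them in a single dictionary preserves their association.
    Attributes with no known property, or with no value, are skipped.
    """
    property_set = {}
    for name, value in ccg_attributes:
        property_name = CCG_TO_PROPERTY.get(name.upper())
        if property_name is not None and value is not None:
            property_set[property_name] = value
    return property_set
```

One dictionary per phrase is only one possible representation; a row in a table or a shared set number would serve the same purpose.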

In some cases, it is desired to know the address of the web page from which the CCG values
were extracted. For example, the purpose of building a database might be to allow searching
of the database by web page classification to provide a list of URLs of web pages or URLs of
portions of web pages which contain matching CCG classifications. The URLs could then be
inserted in an HTML document and transmitted to a web browser as a list of references to web
pages matching a search expression. In that example, associating the URL of a web page or
the URL of a portion of a web page with the CCG values extracted from the same web page or
web page portion is important and the URL or means of reconstructing it must be available and
supplied to the database update process. In one style of database, the values of the same
type are held in separate rows in a column (property) of a database table, and pointers held in
another column (property) are associated with the values by sharing the same table row. The
table row constitutes a set of associated property values. Each pointer points to a bucket
(block of data) containing a list of URLs or pointers to URLs held in a separate bucket or table.
In another style of database, values of different types are held in different tables together with
a set number, pointer or similar code which is used to indicate which values are associated as
members of the same set. In one variation, the values of set members are prefixed with a code
indicating the type of value and all values are held in the same column of a table. If the
purpose of the database is to hold contact data, recording the web page URL in the database
might not be required although if the URL is not present in the database, updating changes in
the CCG contact details contained within a web page is more difficult. Of course, one
database may be used to record all types of CCG values contained in web pages and
associate with each other any and all values extracted from the same web page or even from
other web pages.
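
The first database style described above, in which each property value leads to a bucket of URLs, can be sketched in Python as follows. This is an illustrative in-memory stand-in, not the database design of the specification: the class name ClassificationIndex, its methods, and the example URLs are hypothetical, and combining criteria by intersecting buckets is an assumed way of answering an AND style query.

```python
from collections import defaultdict


class ClassificationIndex:
    """Maps (property name, value) to the bucket of URLs carrying that value."""

    def __init__(self):
        self._buckets = defaultdict(set)

    def add(self, url, property_set):
        """Record every property value of one CCG phrase against its page URL."""
        for property_name, value in property_set.items():
            self._buckets[(property_name, value)].add(url)

    def find(self, criteria):
        """Return the sorted URLs whose pages match all of the given criteria."""
        buckets = [self._buckets.get(item, set()) for item in criteria.items()]
        if not buckets:
            return []
        return sorted(set.intersection(*buckets))


if __name__ == "__main__":
    index = ClassificationIndex()
    index.add("http://example.com/a", {"City name": "Dublin", "Classification codes": "581"})
    index.add("http://example.com/b", {"City name": "Dublin"})
    print(index.find({"City name": "Dublin", "Classification codes": "581"}))
```

The resulting list of URLs is what would be inserted into an HTML document and returned to the enquirer's browser.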

From another aspect, the invention comprises a method of searching the databases
constructed as outlined above. These databases may be used for a variety of searching
purposes. For example, to find web page URLs by using the association of web page URLs
with industry, service, product or subject classification or a person's or organisation's name or
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"dutfig steps of paising a que r, phras, received .^T? 21?" 
10 relational eipraraioos ,„d ton, ^7.11!^?- V? a "••"«* •» «»cl query 

K ,u K^iy vaiues associated with the so locate rrr^-,^ . 
values are extracted. CCG-data database property 

r^oTa S^r^T? o»»» ousu™, .to be implied, such « s e list of URLs of nb 

associated £S fates i£f 2 *»* ««»y » fiod ft. sets of 

selected but oointore »„ >~. V^L®T. re efr,aenl - rt ,s not the values which are 

SSto U^U SLllJiS £ T f° ^ 3nd t * B ™> m un ^ e N» (es URLs or 
40 S to Z^tTn^Z^J^ ^ AND ^ is often 

found fo^^h^lT? n , ° nly Va,U8S or P 0 ^™ ° r toys common to both bets are 

attractively formatted HTM Tn'rLS ! f web B Processed to produce an 

bw^to^a^^iT^lt 01 ^ °° nlaininfl URLs and is sent to a web 

the databaBe and ore^nted i »!!L!f ° r ^ in »he result list are retrieved from 

CCG phrases within web pages which are displayed by a web browser executing on a digital
computer. While a web page is loading or has loaded in a web browser, the web browser
parses the web page and displays the text (or data) of the web page on a display device
connected to the computer. When the web browser parser encounters CCG phrases the web
browser may display the CCG-data (element and/or attribute names (or translations of element
and/or attribute names) and/or values) in a number of browser specific ways. For example, the
web browser may by default not display any CCG-data, display all CCG-data, not display any
CCG-data until a CCG display control attribute explicitly states that subsequent data should be
displayed or display all CCG-data until a CCG display control attribute explicitly states that
subsequent data should not be displayed. The web browser may also use CCG display
controls specifying the size, font, position and so forth to alter the display of the CCG-data.
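
The display behaviours listed above amount to a small state machine: a browser starts in a default state and switches whenever a display control attribute is encountered. The Python sketch below assumes the control names SHOW and HIDE used in Example 1 and represents a phrase as a list of control names and (name, value) pairs; the function name and that representation are hypothetical.

```python
def visible_items(phrase_items, show_by_default=False):
    """Return the CCG-data items a browser would display.

    phrase_items is a list whose entries are either the control names
    "SHOW" / "HIDE" or (attribute name, value) pairs. A control stays in
    force for all following items until the opposite control appears.
    """
    showing = show_by_default
    visible = []
    for item in phrase_items:
        if item == "SHOW":
            showing = True
        elif item == "HIDE":
            showing = False
        elif showing:
            visible.append(item)
    return visible
```

Setting show_by_default selects between the "display nothing until told to" and "display everything until told not to" defaults described in the paragraph above.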

DESCRIPTION OF EXAMPLES 

Having indicated the nature of the present invention, examples or embodiments thereof will
now be described by way of illustration only.

Example 1: HTML Syntax Suitable for Representing a CCG Phrase

The following is an example of HTML element syntax suitable for representing CCG phrases in
which a control (e.g. "SHOW") may be 'good until countermanded' and thus apply to more
than one field:
<CCG HREF="url"
    {{NAME="label" | ID="identifier_code"} &| {LANG="language code" &
    CLASS="class name"}}
    { ...
        {SET_SEPARATOR} &|
        {INDEX | NOINDEX} &|
        {SHOW | HIDE} &|
        {XPOS="horizontal_position_number"} &|
        {YPOS="vertical_position_number"} &|
        {NEWLINE} &|
        {ALIGN=centre | left | right | justify} &|
        {SIZE=[+/-]1|2|3|4|5|6|7} &|
        {COLOR="#rrggbb" | "colour_name"} &|
        {FACE="Type_face_name"} &|
        {BLINK &| BOLD &| UNDERLINE &| ITALIC &| STRIKE} &|
        {SUBSCRIPT | SUPERSCRIPT} &|
        {CLEAR{=left | right | all}} &|
        {NORMAL} &|
        {{{CONTACT &| COPYRIGHT &| DEVELOPER} &|
        {PERSONAL &| BUSINESS &| ASSOCIATION}} &|
        {attribute_name="attribute_value(s)"}}
    }
>

where: the ellipsis implies optional repetition of the braced ('{' '}') items; the braces are
used to group items and are not CCG syntactic elements; "&" (and) implies items must occur
together; "|" (or) implies only one item must occur; and "&|" (and/or) implies any, including none,
of the items may appear together.
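
To make the syntax above concrete, the following Python sketch reads a '<CCG ...>' element and separates bare control names from name="value" pairs while keeping their order, since a control such as SHOW applies to the items that follow it. It assumes Python's standard html.parser; the names CONTROL_NAMES, CCGPhraseParser and parse_ccg_phrases, and the sample tag, are hypothetical.

```python
from html.parser import HTMLParser

# Bare (valueless) control names taken from the syntax of this example.
CONTROL_NAMES = {
    "SET_SEPARATOR", "INDEX", "NOINDEX", "SHOW", "HIDE", "NEWLINE",
    "BLINK", "BOLD", "UNDERLINE", "ITALIC", "STRIKE",
    "SUBSCRIPT", "SUPERSCRIPT", "NORMAL",
}


class CCGPhraseParser(HTMLParser):
    """Collects one ordered item list per <CCG ...> element."""

    def __init__(self):
        super().__init__()
        self.phrases = []

    def handle_starttag(self, tag, attrs):
        if tag != "ccg":
            return
        phrase = []
        for name, value in attrs:
            name = name.upper()  # html.parser lower-cases attribute names
            if value is None and name in CONTROL_NAMES:
                phrase.append(name)           # control attribute
            else:
                phrase.append((name, value))  # data or valued control attribute
        self.phrases.append(phrase)


def parse_ccg_phrases(page):
    """Return the ordered items of every CCG element found in a page."""
    parser = CCGPhraseParser()
    parser.feed(page)
    parser.close()
    return parser.phrases
```

For instance, feeding it '<CCG HREF="http://www.example.com/" INDEX SHOW CN="USBCIC" CC="581;Hardware store">' yields a single phrase whose items appear in the order written.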

Using the syntax of this example, each CCG phrase is represented as an HTML element, the
element name being 'CCG' and the CCG-data (eg attribute_name="attribute_value") and CCG
S?? 25 stares ss

be represented as *<nofmat/>. " an0 NORMAL can 

W S SeF-u^T' ^ ^ ' nd CLASS teke mea "^s from HTML 3.0 The 
15 URL C^l3Zl^ d ^ 0r ^ Mm "KhoMabels. Forexan,p^: 
^ite ^TIl f^l! d06S not ""^ 3 de ^Bort anchor label (or identifier) 

Database control attributes: 

25 ^S!K££!^ 7* ° f aSSO6ati0n between ^ faiwrino data other 

^ Weater mu,Ua, assodafo n «*h «he same CCG phrase or web pace" the data 

.M: : ^^2r.ssssir an ^ p,ied atttbute va,ue ° f ™ * 

Display control attributes:

JSifS *" " PhySfca ' Unte) on *• bro ^ r ^ where 

S ^ «2 hn ° f 38 30 3,ternatiVe method ° f 

40 where data win hZm~^ _ u .71' . indicates lhat Ihe browser screen in the region 

CCG-data attributes:

S^S^STTp ^ V* OWner and/or to «he HTML or web page 

^a~an'7r„^ ■ 

nuo u preseni in a CCG phrase or set and False when 
absent from a CCG phrase or set. The attribute_name could be standard CCG attribute names
or synonyms of standard CCG attribute names or abbreviations of CCG attribute names which
refer to the following types of CCG attribute values, where square brackets '[' and ']' surround
suggested attribute names:
• industry or service or product or subject classifications and sub-classifications:
    • classification name [CN],
    • classification codes [CC],
• display only text [TEXT],
• contact:
    • person:
        • courtesy title [PNC],
        • first given name [PNG],
        • other given names [PNO],
        • family name [PNF],
        • name suffix [PNS],
        • qualifications [PQ],
        • associations [PA],
        • contact person title [PT],
        • contact person role [PR],
    • organisation:
        • name [ON],
        • unit [OU],
        • identifier [OID],
    • physical or post or delivery address:
        • type [AT] (= "PHYSICAL" &| "POST-OFFICE" &| "POSTAL" &| "DELIVERY"),
        • post office box number [AP#],
        • post office name [APN],
        • room or suite or office or unit or flat or apartment name &| number [AB#],
        • floor name &| number [ABF],
        • building name [ABN],
        • lane or street or road or highway number [AS#],
        • lane or street or road or highway name [ASN],
        • suburb or town or city name [ACN],
        • region or state or territory or province name [ARN],
        • post code [APC],
        • country or nation name [ANN],
    • telephone:
        • type [TT] (= "PREFERRED" &| "VOICE" &| "MOBILE" &| "CAR" &| "MESSAGE"
          &| "PAGER" &| "FACSIMILE" &| "MODEM" &| "ISDN" &| "VIDEO"),
        • nation or country code number [TC#],
        • trunk access number [TT#],
        • area code number [TA#],
        • local number [TL#],
    • email:
        • type [ET] (= "INTERNET" | {other}),
        • mailer [EM],
        • address [EA],
    • Internet address:
        • url [URL],
    • date & time:
        • date & time from [DTF],
        • date & time to [DTT],
        • weekday from [DTWF],
        • weekday to [DTWT],
        • weekday time from [DTWFT],
        • weekday time to [DTWTT],
        • time zone [DTZ],
• brand name [BN],
• public key:
    • key type [KT],
    • key [K],
• geographical:
    • location units [GLU],
    • location [GL],
    • serviced region units [GLRU],
    • serviced region [GLR].
Suggested attribute name [CN] is the name of an attribute associated with the attribute value
containing 'classification name' type data. For example, the [CN] attribute value could be the
name of a proprietary or national or international or other industry classification standard such
as the Australian and New Zealand Standard Industry Classification or "ANZSIC" for short or

the US Bureau of the Census Industrial Classifications (USBCIC). The associated
classification codes [CC] attribute value could contain the codes and/or descriptions of the

25 e^™* rjZ-^tr^*! ° r wahoul "^^^ **■■» or extensions. For 
SZ h CC=T31;Road transport' or CN=\JSBCIC- CC='58liHardware store'. 

Service dassrficabons such as the International Standard Classification of Occupations could 
M^^^ amp,e : CN=,,SC ° 0 ' ^^^^ctioneer- Product classifications such as the 
SSS^rSZI^ f^ 0 " Coding System could be used. For example: 
™ J CC-8411;Turbojets, turbo^ropeOers & other gas turbines; parts thereof For 

subject classifications, Dewey Decimal and/or Universal Decimal and/or Library of Congress
Classification and/or Colon Classification could be used. For example: CN='DDC'
CC='577.699;Sea shore ecology'. The inclusion of subject classifications provides a very

riL S ^^ n ?' d mBth0d ° f da88iVn9 * B of an HTML document which 

could be attractive to commercially oriented copyright owners. 


The text ([TEXT]), person ([PNC] - [PR]), organisation ([ON] - [OID]), physical or post or
delivery address ([AT] - [ANN]), telephone ([TT] - [TL#]), email address ([ET] - [EA]) and
Internet address ([URL]) are intended to be associated with each other in the obvious manner.
Date & time(s) ([DTF] - [DTZ]) are intended to indicate the times at which the address and/or
telephone details apply to the associated person(s) and/or organisation(s).

The brand name ([BN]) attribute is intended to hold commercial brand names. Public key ([KT]
- [K]) attributes are intended to [...] of communication with the contact.

[...] (eg E148.5201, S36.6693 or -148.5201,36.6693), or a [...]
global, national, regional [...] reference
[...] [GLU] [...] a digitally encoded
[...]. In the more populated regions of some countries such as the U.S., street

addresses and post codes are associated with a moderately accurate geographic location and
can be used to interpolate geographic location data where geographic location data is not
explicitly stated in the CCG-data. Using a universally recognised code such as latitude and
longitude has advantages when used with international mediums like the Internet.
Geographical location is intended to be associated with a post delivery address or physical
address such as place of business or residence. A CCG compliant browser could use this
reference to display a map centred on that geographic location. The purpose of the
geographical location data is to allow browser users to specify search engine search criteria
which will result in the search engine selecting only those Internet accessible documents which
provide details about providers which are within a specified region. The serviced region [GLR]
is intended to indicate the preferred area of operation of providers expressed in terms of
serviced region units [GLRU]. A radial distance (eg in kilometres) or alternate means of
expressing an area of interest around a geographic point, such as polygons, are envisaged.

It is envisaged that the CCG attribute_value could be composed of more than one value
(actually sub-value) wherein specific characters or character strings separate individual values.

While specific instances of element names and types have been given in this example, of
more importance is the type of data and type controls over the display and indexing of the
data. As an alternative to the preferred immediately following example where the CCG-data is
lumped together under the HTML element named 'CCG', certain elements of the data, for
example the classification data, could be lumped under separate HTML elements with
distinctly different names thereby separating CCG classification data from CCG contact data.
However, this is not preferred because the strength of association between the two types of
data is weakened.


Example 2: Classification of Portion of a Web Page. 

Where it is desired to classify a portion of a web page, such as a paragraph about a product,
simple CCG-data may be used in conjunction with the syntax of Example 1. For example:
<A NAME="Radios">AM-FM radio receivers: </A>
<CCG HREF="#Radios">
CN="ANZSIC"
CC="E23.34.78;Electrical equipment - radio receivers AM"
CC="E23.34.79;Electrical equipment - radio receivers FM"
</CCG>
We won't be beaten on the price of these high quality receivers ....

In this example, the CCG phrase appears after the related anchor (<A NAME=...</A>).
However, while such proximity visually provides an obvious association between the anchor
and related CCG phrase, it is intended that a CCG phrase containing the attribute HREF related
to a specific anchor could appear anywhere within the body of a web page and remain related
to the named anchor. The CCG phrase containing the attribute HREF could appear in a
separate document and thereby relate the CCG-data to the entire document or to a named
anchor although, as previously noted, coordinating separate documents can be problematic. In
the absence of the HREF and NAME attributes, it is also intended that the CCG-data apply to
the whole web page.


Example 3: Classification of Portion of a Web Page using XML Syntax

Using XML syntax and similar attribute names to those of Example 2, the HTML fragment of
Example 2 may be rewritten as:

<A NAME="Radios">AM-FM radio receivers: </A>
<XML>
<CCG>
<HREF>"#Radios"</HREF>
<CN>"ANZSIC"</CN>
<CC>"E23.34.78;Electrical equipment - radio receivers AM"</CC>
<CC>"E23.34.79;Electrical equipment - radio receivers FM"</CC>
</CCG>
</XML>

We won't be beaten on the price of these high quality receivers ....

This example demonstrates that the translation of CCG-data from HTML to XML (and the
reverse) involves simple syntactical and grammatical translations. Of course, the resulting
HTML and XML, while 'well formed', might not be recognised or, if recognised, might not be
understood by some parsers.
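
By way of illustration only, the translation from attribute name and value pairs to the XML element form could be sketched as follows. The function name `ccg_pairs_to_xml` and the representation of CCG-data as a list of (name, value) pairs are assumptions made for this sketch and are not part of the disclosure.

```python
from xml.sax.saxutils import escape


def ccg_pairs_to_xml(pairs):
    """Render (attribute name, attribute value) pairs in the XML element form.

    Hypothetical helper: each pair becomes <NAME>value</NAME>, wrapped in the
    <XML> and <CCG> elements used in Example 3.
    """
    lines = ["<XML>", "<CCG>"]
    for name, value in pairs:
        # escape() protects characters such as '&' and '<' inside values.
        lines.append(f"<{name}>{escape(value)}</{name}>")
    lines += ["</CCG>", "</XML>"]
    return "\n".join(lines)


if __name__ == "__main__":
    example_pairs = [
        ("HREF", "#Radios"),
        ("CN", "ANZSIC"),
        ("CC", "E23.34.78;Electrical equipment - radio receivers AM"),
        ("CC", "E23.34.79;Electrical equipment - radio receivers FM"),
    ]
    print(ccg_pairs_to_xml(example_pairs))
```

The reverse translation would walk the elements and emit NAME="value" pairs in the same order, since the order of attributes within a CCG phrase is significant.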

Example 4: Constructing a Web Page Containing CCG-data

As an example, a web page developer, Alice Jamieson, is preparing an advertisement for a
local electrician John Williams, trading as Kelso Electrical, who wants to advertise on the web
for business within 30 kilometres from his office located at 18 Raglan Street, Kelso, New South
Wales. Alice uses a graphical user interface web page authoring tool capable of creating and
modifying web pages containing HTML (and XML) CCG phrases by accepting inputs from a
user. The tool executes on a digital computer having input devices such as a keyboard,
mouse, light pen and touch pad, display devices such as a CRT, LED arrays, liquid crystal
arrays and computer-readable media such as magnetic and optical disks, memory arrays,
magnetic tape and the like.

The authoring tool also embodies knowledge of the content and structure of CCG phrases
such as the attribute names, valid ranges and sets of associated attribute values, the normal
order of the attributes in the CCG phrase and interdependences between attribute values. The
tool provides a window where web pages may be viewed in layout (browser) mode and
another window where the HTML code may be viewed in editing mode. The tool also provides
means of inserting, deleting, modifying and organising HTML elements, changing font size,
face and colour and so forth. The tool provides means for the user to build CCG phrases by
using input devices to select an edit control representing various types of CCG attributes from
a list which the tool then inserts in the body of a web page together with, when not already
present, HTML code indicative of the start and end of a CCG phrase. The user then types in
the value in the attribute. Similarly, the tool provides means of converting web page text to
CCG attributes. Using input devices, the user selects the text to be converted to a CCG
attribute then selects an edit control from a list; the tool then inserts the HTML code necessary
to encode the text as a CCG attribute. However, these semi-manual methods of creating and
modifying CCG phrases are inefficient and error prone. The tool also provides a button, which
can be activated by using input devices, for access to CCG phrase editing functions. The CCG
editing functions consist of a means of extracting the CCG values from existing CCG phrases
in the web page being edited, forms for entering and modifying the extracted CCG values, a
layout view browser window for altering how the CCG-data displays (position, font size, face,
colour, bold, normal, hiding or showing and so forth), a data view browser window to alter
which CCG-data values are to be indexed or not indexed in search engine databases, and a
means of deleting existing CCG phrases from web pages and inserting new or changed CCG
phrases in web pages. Editing cursors marking the current location at which text and/or data
may be inserted, deleted or modified are provided in each window and form.
In the current example, the web page initially contains no CCG phrase. Clicking the CCG
editing function button of the authoring tool causes a form to appear. The form contains
prompts related to CCG attribute names and associated data input fields related to the CCG
attribute values associated with the CCG attribute names, that is CCG-data. The fields are
blank because, in the web page layout view, the edit cursor is not over a CCG phrase (and can
not be since the web page initially contains no CCG phrase). The service classifications
relevant to the web page, John Williams' physical business contact address, phone and fax
numbers, email address and geographic location and his post office business contact
addresses are entered into the forms using a keyboard and mouse. The developer, Alice
Jamieson, also includes her basic contact details where provided for on the form. The forms
use drop down lists to select address blocks (eg physical and post office) for editing. Logic
associated with the forms validates the CCG attribute values and interdependences. Input
devices are then used to control the CCG-data layout view browser to modify the appearance
of the CCG-data such as font size and colour and positioning. In the layout browser, input
devices communicating with the edit cursor are used to highlight individual items and blocks of
items to be changed. The post office address is highlighted as a block and moved into position
in line with the physical address. The CCG-data view window is then used to check which data
items are to be indexed by search engines. In this example all CCG-data (ie all CCG attribute
values except display control values and database control values) are to be indexed. Input
devices are used to control the edit cursor to highlight the entire data and a mouse is used to
click (activate) a button to mark all the data for indexing. Then another button is clicked which
builds an HTML encoded CCG phrase of CCG attributes derived from the CCG-data values,
display control values and database control values and inserts the CCG phrase in the web
page at the location pointed to in the web page layout browser window.



The HTML code editing mode window was called up, which revealed the following HTML
encoded CCG phrase in the web page:

<XML>
<CCG>
<INDEX/>
<HIDE/>
<CN>ANZSIC</CN>
<CC>D36.11.45;Electrical contractors - residential</CC>
<CC>D36.11.46;Electrical contractors - industrial</CC>
<SHOW/>
<CONTACT/> <COPYRIGHT/>
<BUSINESS/>
<XPOS>50</XPOS>
<YPOS>320</YPOS>
<ALIGN>centre</ALIGN>
<SIZE>3</SIZE>
<COLOR>black</COLOR>
<FACE>Times New Roman</FACE>
<BOLD/>
<CLEAR>all</CLEAR>
<TEXT>Contact :</TEXT>
<PNC>Mr</PNC>
<PNG>John</PNG>
<PNF>Williams</PNF>
<PQ>AIE</PQ>
<PA>ARUC</PA>
<NEWLINE/>
<PT>Managing Director</PT>
<NEWLINE/>
<ON>Kelso Electrical Pty. Ltd.</ON>
<NEWLINE/>
<NORMAL/> <ITALIC/>
<SIZE>-2</SIZE>
<TEXT>NSW License 45678C</TEXT>
<NEWLINE/>
<NORMAL/> <BOLD/>
<SIZE>+2</SIZE>
<AT>PHYSICAL</AT>
<AS#>18</AS#>
<ASN>Raglan Street</ASN>
<NEWLINE/>
<ACN>Kelso</ACN>
<NEWLINE/>
<ARN>NSW</ARN>
<NEWLINE/>
<HIDE/>
<ANN>Australia</ANN>
<NEWLINE/>
<SHOW/>
<TEXT>Phone:</TEXT>
<TT>PREFERRED ; VOICE ; MESSAGE</TT>
<HIDE/>
<TC#>61</TC#>
<SHOW/>
<TT#>0</TT#>
<TA#>63</TA#>
<TL#>456-7828</TL#>
<TEXT> Fax:</TEXT>
<TT>FACSIMILE</TT>
<HIDE/>
<TC#>61</TC#>
<SHOW/>
<TT#>0</TT#>
<TA#>63</TA#>
<TL#>456-7829</TL#>
<NEWLINE/>
<ET>INTERNET</ET>
<EA>johnw@firefly.com.au</EA>
<TEXT> </TEXT>
<GLU>LatLong</GLU>
<GL>33.3978S;148.5679E</GL>
<GLRU>Km</GLRU>
<GLR>30</GLR>
<SET_SEPARATOR/>
<XPOS>250</XPOS>
<YPOS>320</YPOS>
<NEWLINE/>
<NEWLINE/>
<TEXT>Or write to us at :</TEXT>
<NEWLINE/>
<ON>Kelso Electrical Pty. Ltd.</ON>
<NEWLINE/>
<AT>POST-OFFICE</AT>
<AP#>P.O. Box 187</AP#>
<NEWLINE/>
<APN>Sunny Corner</APN>
<TEXT> </TEXT>
<APC>2795</APC>
<NEWLINE/>
<HIDE/>
<ANN>Australia</ANN>
<SET_SEPARATOR/>
<HIDE/>
<DEVELOPER/>
<BUSINESS/>
<PNG>Alice</PNG>
<PNF>Jamieson</PNF>
<ET>INTERNET</ET>
<EA>alOam@firefly.com.au</EA>
<IURL>http://www.fire[...]</IURL>
</CCG>
</XML>

In the web page layout browser window the CCG-data displayed as follows:

Contact :                         Or write to us at :

Mr John Williams. AIE. ARUC.
Managing Director
Kelso Electrical Pty. Ltd.        Kelso Electrical Pty. Ltd.
NSW License 45678C                P.O. Box 187
18 Raglan Street                  Sunny Corner 2795
Kelso
NSW

Phone: 063-456-7828 Fax 063-456-7829
Email: johnw@firefly.com.au Map


Having encoded the web page in this way, Alice then posts it on the storage device of a digital
computer connected to the Internet from where it can be retrieved through the Internet using
the URL "http://www.firefly.co[...]".

Example 5: Constructing a Database from Web Pages Containing CCG-data

During a routine sweep of Internet connected web page servers, a web crawler (or robot)
operating on a server named 'ccg-search.com' executing on an Internet connected digital
computer discovers the URL "http://www.firefly.com.au/[...]" in a document it
had previously retrieved through the Internet. The web crawler decides that the URL matches
its selection criteria because the URL contains the suffix "html". The web crawler then
successfully retrieves the document as follows. It extracts from the URL the address of the computer
hosting the document, then addresses and sends a message (including the address of the web
crawler) requesting the web page through the network to the web page host computer using
TCP/IP protocol. The host computer then reads the document, addresses and sends the
document to the web crawler using TCP/IP protocol, and the web crawler waits until it has
received all parts of the web page from the host computer before proceeding. It inspects the
contents of the document and finds that it matches the additional selection criteria that it is an
HTML encoded document. The web crawler program, depending on its state and logic, then
parses the document, strips out and saves some or all of the URLs in the document for future
examination. The web crawler program then passes the document together with the URL of
the document through a network communications channel to an indexing program executing
on a different computer. The indexing computer has database updating software which
manipulates a database stored on computer-readable media.
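
The URL selection and extraction steps performed by the web crawler could, purely as a sketch, be expressed as follows. The class `LinkExtractor`, the functions `matches_selection_criteria` and `extract_candidate_urls`, and the choice of testing the URL path for the "html" suffix are assumptions for illustration; network retrieval and the hand-off to the indexing program are omitted.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the absolute URLs of <a href="..."> links in a document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before this call.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def matches_selection_criteria(url):
    """Assumed criterion: the path part of the URL ends with 'html'."""
    return urlparse(url).path.lower().endswith("html")


def extract_candidate_urls(document, base_url):
    """Return the URLs found in the document that the crawler would keep."""
    parser = LinkExtractor(base_url)
    parser.feed(document)
    return [url for url in parser.links if matches_selection_criteria(url)]
```

The saved URLs would then be queued for future examination, as described above.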

The indexing program parses the document from first to last character, indexing some of the
meta data in the <head> of the document and the words in the text of the document with
respect to the document URL. In the database of this example, unique words extracted from
the documents already indexed are held in separate rows of a column of a database table and
in another column of the same table on each row is an associated pointer to the first bucket or
block of URLs of documents containing the word associated with the pointer. As new words
are found, the new word is added as a new row in the word column of the table, a new bucket
is created, the URL of the document containing the new word is inserted into the bucket and a
pointer to the new bucket is written in the new row pointer column. When the same word is
found in another document, the row in the table of the word is found, the pointer is retrieved
from the table, the bucket pointed to by the pointer is retrieved and the URL of the other
document is inserted in the bucket. Where a bucket becomes full of URLs, a new bucket is
created and a pointer to the new bucket for holding additional URLs is placed in the full bucket.
Deletion of words and URLs of changed or no longer existing documents is also provided for.
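
A minimal in-memory sketch of the word table and chained buckets described above is given below. The names `Bucket`, `WordIndex` and `BUCKET_CAPACITY`, and the capacity of four URLs per bucket, are assumptions chosen for illustration; a practical database would hold the table and buckets on computer-readable media and would also provide deletion.

```python
BUCKET_CAPACITY = 4  # assumed number of URLs a bucket holds before it is full


class Bucket:
    """A block of URLs with a pointer to the next bucket in the chain."""

    def __init__(self):
        self.urls = []
        self.next = None


class WordIndex:
    """Word table: each word points to the first bucket of its URL chain."""

    def __init__(self):
        self.table = {}

    def add(self, word, url):
        bucket = self.table.get(word)
        if bucket is None:
            # New word: new row in the table and a new first bucket.
            bucket = self.table[word] = Bucket()
        # Walk to the last bucket, stopping early if the URL is already held.
        while True:
            if url in bucket.urls:
                return
            if bucket.next is None:
                break
            bucket = bucket.next
        if len(bucket.urls) >= BUCKET_CAPACITY:
            # Full bucket: create a new one and link it from the full bucket.
            bucket.next = Bucket()
            bucket = bucket.next
        bucket.urls.append(url)

    def lookup(self, word):
        """Return every URL recorded for the word, in insertion order."""
        urls = []
        bucket = self.table.get(word)
        while bucket is not None:
            urls.extend(bucket.urls)
            bucket = bucket.next
        return urls
```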

In addition to indexing words extracted from the text of the document, the indexing program
also indexes the CCG-data in the document as well as indexing words found in the CCG-data.
When the parser finds the HTML element "<XML>" in the document it switches into XML parsing
mode and switches out of that mode when "</XML>" is found. When the element "<CCG>" is
found the parser switches into CCG parsing mode and switches out of that mode when
"</CCG>" is found.

The example database has a CCG-data attribute name to database property name
correspondence table to show the relationship between the CCG-data attribute names and the
database tables and columns (properties) where the CCG-data attribute values are to be
stored in the database as database property values. The database property values and
associated URLs are stored in much the same way as for words extracted from text as
outlined above. However, CCG contact data, for example, which consists of several distinct
CCG-data attributes which are related (eg street name, city), is stored in a database table
having a column (property) related to each distinct CCG contact attribute name, and each
set of related values (eg person's name, address, telephone number) as separated
by "<SET_SEPARATOR/>", "<CCG>" and "</CCG>" is held in a separate row in the table. The
values stored in each row are considered to be a set of associated property values of different
types.
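
As a hedged sketch of how the correspondence table and the set separator could cooperate, the following groups a sequence of (attribute name, value) pairs into rows. The `CORRESPONDENCE` mapping, its table and column names, and the function `rows_from_pairs` are hypothetical; the separator is represented here as a pair whose name is "SET_SEPARATOR".

```python
# Hypothetical correspondence table: CCG attribute name -> (table, column).
CORRESPONDENCE = {
    "PNG": ("contact", "given_name"),
    "PNF": ("contact", "family_name"),
    "ON": ("contact", "organisation"),
    "ASN": ("contact", "street_name"),
    "ACN": ("contact", "city"),
}


def rows_from_pairs(pairs):
    """Split CCG-data pairs into rows, starting a new row at each separator."""
    rows, row = [], {}
    for name, value in pairs:
        if name == "SET_SEPARATOR":
            if row:
                rows.append(row)
            row = {}
        elif name in CORRESPONDENCE:
            _table, column = CORRESPONDENCE[name]
            row[column] = value
        # Attribute names absent from the correspondence table are skipped.
    if row:
        rows.append(row)
    return rows
```

Each returned dictionary corresponds to one row of associated property values of different types.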
The indexing program, during parsing the document of Example 2 above, encounters the
"<CCG>" element and enters the CCG parsing mode. The parser knows to ignore display
control attributes and to consider database control elements in the CCG phrase. The example
indexing program opts to index all other CCG-data contained in the attribute values until
explicitly instructed not to index the attribute values by encountering the "<NOINDEX/>"
database control element and then to recommence indexing when the "<INDEX/>" database
control element is encountered.
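
The CCG parsing mode and the effect of the "<NOINDEX/>" and "<INDEX/>" database control elements could be sketched as below. This is an assumption-laden illustration, not the disclosed parser: it tracks only the CCG mode (not the enclosing XML mode), uses a simple regular expression tokenizer because element names such as AS# are not accepted by standard XML parsers, and the contents of `DISPLAY_CONTROLS` are an assumed list.

```python
import re

# A start, end or empty element tag, or a run of text between tags.
TOKEN = re.compile(r"<(/?)([A-Za-z_#]+)\s*(/?)>|([^<]+)")

# Assumed set of display control element names that are never indexed.
DISPLAY_CONTROLS = {"XPOS", "YPOS", "ALIGN", "SIZE", "COLOR", "FACE", "CLEAR"}


def extract_indexable(document):
    """Return (attribute name, value) pairs that the indexer would store."""
    in_ccg = False      # True while between <CCG> and </CCG>
    indexing = True     # toggled by <NOINDEX/> and <INDEX/>
    current = None      # name of the element whose text is being read
    found = []
    for closing, name, empty, text in TOKEN.findall(document):
        if text:
            if in_ccg and indexing and current and current not in DISPLAY_CONTROLS:
                value = text.strip()
                if value:
                    found.append((current, value))
            continue
        if name == "CCG":
            in_ccg = not closing
            indexing = True
            current = None
        elif not in_ccg:
            continue
        elif empty:
            if name == "NOINDEX":
                indexing = False
            elif name == "INDEX":
                indexing = True
        else:
            current = None if closing else name
    return found
```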

Taking each CCG-data attribute name and associated attribute value(s) in succession, the
example indexing program uses the correspondence table to translate the CCG-data attribute
name to the database table and column (property) names where the CCG-data attribute
value(s) are to be stored as database property value(s). The indexing program may opt to
translate the CCG-data attribute values to database property values by, for example,
converting character strings of digits to binary encoded decimal representation, the string
'True' to a single bit representation and the like. The indexing program then adds or updates
the database property value(s), using the database table and column (property) names (or
similar references) obtained by translation, in much the same manner as outlined above for the
update of the database using words extracted from the document text, including associating
the data to the document URL where desired. Where the CCG-data contains an 'HREF'
attribute (or similar), the URL associated with the other CCG-data is a URL taken from the
'HREF' attribute value or composed of the document URL and the 'HREF' attribute value if
the attribute value is a partial or relative URL. Some CCG attributes, such as "<BUSINESS/>",
have only an implied value of true if the attribute is present and false if the attribute is absent,
the "<SET_SEPARATOR/>", "<CCG>" and "</CCG>" resetting such values to false. However,
where attribute value(s) associated with different attribute names are still related, such as a
person's name and a street name, the related values of different types are stored on the same
row of the same database table but in a different column (database property) to preserve the
relationship. "<SET_SEPARATOR/>" limits the degree of relatedness between, for example, a
person's name occurring before the separator and a street name occurring after the separator.
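
The composition of the URL associated with CCG-data from the document URL and a partial or relative 'HREF' attribute value can be sketched with the standard URL joining rules. The function name `ccg_target_url` and the example.com addresses are illustrative assumptions.

```python
from urllib.parse import urljoin


def ccg_target_url(document_url, href=None):
    """Return the URL to associate with the CCG-data.

    With no HREF value the CCG-data applies to the whole document; otherwise
    a partial or relative HREF is resolved against the document URL, and an
    absolute HREF is used unchanged.
    """
    if not href:
        return document_url
    return urljoin(document_url, href)


if __name__ == "__main__":
    page = "http://www.example.com/ads/radios.html"
    print(ccg_target_url(page, "#Radios"))
    print(ccg_target_url(page, "tv.html"))
    print(ccg_target_url(page))
```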

Using the example document and using the same database column (property) names as used
for the CCG-data attribute names, a portion of the constructed database table would look
like:

PNC   PNG    PNF        PQ    PA     PT                  ...   URL
Mr    John   Williams   AIE   ARUC   Managing Director   ...   (pointer)


Difficulties not highlighted by this example are the need to handle properties having multiple
values of the same type, "sparse rows" where only a few values are not null (blank) and tables
with extremely large numbers of rows. For example, the CCG-data of this example could have
contained multiple values of personal qualifications ("PQ"). To represent this type of data using
a 2 dimensional table database system, the database would be "normalised" so that the
multiple values were stored in a separate table and keys or pointers were used to
relate the items in the two tables. Numerous alternate database systems, for example those
based on key hashing and data buckets, or tagging data values with prefixes or suffixes
related to the type of data value may be used. Preferably, however, whatever database
system is used, it should preserve the associations of CCG-data items present in the CCG
phrases.
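
One way of "normalising" multiple values of the same type, sketched here with an in-memory SQLite database, is to hold the repeated values in a second table keyed to the contact row. The table and column names, the function `build_normalised_demo`, the example.com URL and the second qualification value "XYZ" are all hypothetical.

```python
import sqlite3


def build_normalised_demo():
    """Store one contact with two qualifications and read them back joined."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE contact (
            id  INTEGER PRIMARY KEY,
            png TEXT,
            pnf TEXT,
            url TEXT
        );
        CREATE TABLE qualification (
            contact_id INTEGER REFERENCES contact(id),
            pq         TEXT
        );
        """
    )
    cur = conn.execute(
        "INSERT INTO contact (png, pnf, url) VALUES (?, ?, ?)",
        ("John", "Williams", "http://www.example.com/kelso.html"),
    )
    contact_id = cur.lastrowid
    # Multiple PQ values live in their own table, related by contact_id.
    conn.executemany(
        "INSERT INTO qualification (contact_id, pq) VALUES (?, ?)",
        [(contact_id, "AIE"), (contact_id, "XYZ")],
    )
    rows = conn.execute(
        "SELECT c.pnf, q.pq FROM contact c "
        "JOIN qualification q ON q.contact_id = c.id ORDER BY q.pq"
    ).fetchall()
    conn.close()
    return rows
```

The key column preserves the association between the person and each qualification, which is the property the preceding paragraph asks of any database system used.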
Because the geographic location data was missing from the postal address of the CCG-data in
the example document, but a post code was present, the indexing program inferred the
geographic location from the post code.


Example 6: Finding Web Page References Using a CCG Database

As an example, Kevin Robson lives in Sydney but owns and has rented out a house in
Bathurst. He wants to use the web to find some electricians based in the general Bathurst
region (not only in Bathurst City) to contact for estimating the cost of modifying the wiring in the
house. He uses his web browser to open the web page
"http://www.ausline.com.au/web_search.html" containing AusLine's search engine web page
search criteria input form encoded using the HTML "<form>" element.

The search criteria input form contains several input fields including those labelled "Service
classification", "Key words", "City/Suburb/Town", "Country", "Lat/Long" and "Radius". The form
also displays a button labelled "Map" to allow latitude and longitude to be selected by pointing
to map images. The word "electrician" is typed into the "Service classification" field, "house
wiring" into the "Keywords" field, "Bathurst" into the "City/Suburb/Town" field and "10" into the
field "Radius". The country "Australia" was already showing in the country field because the
web page server had received cookie data from the browser indicating that that was the
country used when the browser last used the web page. The "Submit search" button on the
web page was clicked. The browser transmitted a message using TCP/IP protocol to the
AusLine server containing the input field values encoded in the header of the message.

After a short delay, the search result HTML encoded web page was returned. Clicking on the
"Service classification" input field drop down list box to check the classifications used in the
search revealed three items:

• Electrical contractors - residential

• Electrical contractors - industrial

• Electrical engineers

The search engine attached to the server obtained those classifications by using word
stemming and searching the text of the service classifications held in its database. The
Lat/Long field contained the value "33.3856S;148.5743E" which the search engine obtained
by looking up the latitude and longitude of the town "Bathurst" in the country "Australia" in its
database. Clicking on the "Map" button retrieved a web page having the image of a map
centred on the town of Bathurst and showing the area 20 Km around it. The search engine
obtained the map by making a request to another Internet connected server and supplying the
latitude, longitude and radius. Clicking on the browser "Back" button returned to the search
results page.


The search results contained 8 titles, brief descriptions and URLs including a reference
containing the URL "http://www.firefly.c[...]". Retrieving each in turn
revealed that all were well focused according to the search criteria, being related to electricians,
electrical contractors and engineers in the Bathurst area. The search engine obtained these
references to web pages by:

• searching its database of service classification titles with words stemming from
"electrician", which resulted in three service classification codes,

• searching its database using the three service classification codes to obtain an
intermediate list of URLs of web pages containing those CCG codes,

• searching its database for the two keywords to obtain an intermediate list of URLs of
web pages containing those words in the web page text,

• searching its database to find the latitude and longitude of Bathurst, Australia,

• searching its database to obtain an intermediate list of web pages which contain
latitude and longitude data lying within 10 Km of the latitude and longitude of
Bathurst, Australia,

• producing as a result list, a list of URLs which are common to all the intermediate lists,

• obtaining from its database the title and brief description of the web pages,

• formatting the titles, descriptions and URLs into an HTML encoded report,

• transmitting the report to the enquiring web browser.
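
The production of the result list from the intermediate lists can be sketched as a set intersection, with the radius test made by great-circle distance. Everything named here is an assumption for illustration: the functions `haversine_km` and `search`, the dictionary-based stand-ins for the database, the use of the haversine formula with a mean Earth radius of 6371 km, and the treatment of each keyword as its own intermediate list.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    earth_radius_km = 6371.0  # assumed mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def search(by_code, by_word, locations, codes, words, centre, radius_km):
    """Return the URLs common to every intermediate list, sorted.

    by_code / by_word map a classification code or word to a set of URLs;
    locations maps a URL to a (latitude, longitude) pair in signed degrees.
    """
    intermediate = []
    # URLs carrying any of the selected classification codes.
    intermediate.append(set().union(*(by_code.get(c, set()) for c in codes)))
    # URLs whose text contains each keyword.
    for word in words:
        intermediate.append(by_word.get(word, set()))
    # URLs whose recorded location lies within the radius of the centre.
    intermediate.append({
        url for url, (lat, lon) in locations.items()
        if haversine_km(centre[0], centre[1], lat, lon) <= radius_km
    })
    return sorted(set.intersection(*intermediate))
```

Southern latitudes are written as negative degrees in this sketch, so the value "33.3856S;148.5743E" becomes (-33.3856, 148.5743).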

Example 7: Finding Contact Details Using a CCG Database

As an example, Jim Jones of Jones and Sons wants to send a recall notice about a faulty
batch of UV stabilised electrical power cable to all Electrical contractors and Electrical
wholesalers in Australia who have email addresses. He uses his web browser to open the web
page "http://www.ausline.com.au/contact_search.html" containing AusLine's search engine
contact search criteria input form encoded using the HTML "<form>" element.

The search criteria input form contains several input fields including those labelled "Service
classification", "Country" and "Output format". The word "electric" is typed into the "Service
classification" field, the word "Australia" is typed into the "Country" field and the "Tabular -
Name & Email" option in the "Output format" drop down list box is selected. The "Submit
search" button on the web page is clicked. The browser transmits a message using TCP/IP
protocol to the AusLine server containing the input field values encoded in the header of the
message.

After a short delay, the search result HTML encoded web page is returned. Clicking on the
"Service classification" input field drop down list box to check the classifications used in the
search revealed too many classifications for the result to be sufficiently focused. The following
four classifications were selected from the list:

• Electric cable - ducting systems

• Electrical contractors - residential

• Electrical contractors - industrial

• Electrical wholesalers

and the "Submit search" button is pressed again to refine the search.

The search results contained 3,473 names and associated email addresses and URLs to full
contact details. Jim saved the search result page on his computer so that he could use his
email program to send the recall notice to each email address in the list. The email address
"johnw@firefly.com.au" was included in the list.

The search engine obtained these references to web pages by:

• searching its database using the four service classification titles which resulted in four
service classification codes,

• searching its database using the four service classification codes to obtain an
intermediate list of database primary keys of database table rows containing those
service classification codes in the database Service classification attribute,

• searching its database using the country name "Australia" to obtain an intermediate
list of database primary keys of database table rows containing that word in the
database Country attribute,

• producing as a result list a list of database primary keys which are common to both
the intermediate lists,

• obtaining from its database using the result list the values of the name and email
attributes,

• using the HTML <table> element to format the name values, email values and full
detail URLs into an HTML encoded report,

• transmitting the report to the enquiring web browser.
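
The formatting step using the HTML <table> element might, as a sketch only, look like the following. The function name `contact_report`, the column headings and the link text "details" are assumptions; the rows are taken to be (name, email, URL) tuples already retrieved from the database.

```python
from html import escape


def contact_report(rows):
    """Format (name, email, detail URL) rows as an HTML <table> report."""
    lines = ["<table>", "<tr><th>Name</th><th>Email</th><th>Details</th></tr>"]
    for name, email, url in rows:
        # escape() guards against characters that are special in HTML.
        lines.append(
            "<tr>"
            f"<td>{escape(name)}</td>"
            f"<td>{escape(email)}</td>"
            f'<td><a href="{escape(url, quote=True)}">details</a></td>'
            "</tr>"
        )
    lines.append("</table>")
    return "\n".join(lines)
```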

This example relates to finding sets of associated database contact values without requiring
references to web pages. However, finding other sets of associated database values such as
sets of associated industry classification values and geographic location values might also be
useful for some purposes.

Thus it is appreciated that the afore stated goals, advantages and objectives are achieved by
the teachings herein. In particular it is seen that, unlike the prior art, efficiently searchable
Yellow pages and White pages databases and the like may be automatically constructed from
HTML encoded web pages. Additionally the database entries may be automatically linked to
specific web pages and portions of web pages allowing convenient methods of indexing of
product and service catalogues and the like. It is also appreciated that simpler methods of
constructing databases suited to a variety of other uses such as industry and subject
directories are also provided.

From the foregoing teachings and with the knowledge of those skilled in the art, it is apparent
that other modifications and adaptations of the invention will become apparent. For example,
the method steps disclosed and claimed herein may be practiced in a variety of different
orders. CCG-data may take on a variety of different forms within the meaning of the claims.
Thus, it is our intention to include within the scope of the claims not only the invention literally
embraced by the language of the claims but to include all such modifications and adaptations
which may come to those skilled in the art.
What I claim is:

1. An HTML encoded web page embodied on a computer-readable medium, said web
page comprising at least one HTML encoded CCG phrase, each CCG phrase
comprising:

a) HTML code indicative of the start of a CCG phrase,

b) at least one CCG-data attribute, and

c) HTML code indicative of the end of a CCG phrase.

2. An HTML encoded web page embodied on a computer-readable medium, said web
page comprising at least one HTML encoded CCG phrase, each CCG phrase
comprising:

a) HTML code indicative of the start of a CCG phrase,

b) at least two CCG-data attributes,

c) at least one database control attribute separating said CCG-data attributes into at
least two sets of CCG attributes, and

d) HTML code indicative of the end of a CCG phrase.

3. An HTML encoded web page embodied on a computer-readable medium, said web
page comprising at least one HTML encoded CCG phrase, each CCG phrase
comprising:

a) HTML code indicative of the start of a CCG phrase,

b) at least one CCG-data attribute,

c) at least one attribute of: database control attributes, display control attributes; and

d) HTML code indicative of the end of a CCG phrase.

4. A computer implemented method of building a web page comprising at least one HTML 
encoded CCG phrase, the method comprising the steps of: 

a) displaying a web page on a computer display device. 
30 b) displaying an edit cursor Indicating a character position on said display device and 
a corresponding character position in said web page, said edit cursor being 
posittonable within the display of said web page by use of computer input devices, 
c) separately displaying on said computer display device a set of edit controls 
representing CCG-data attribute types. 
35 d) positioning said edit cursor within said display of said web page using said input 
devices, 

e) selecting an edit control from said set of edit controls using said input devices, 

f) relating said selected edit control to a corresponding CCG-data attribute name,

g) constructing a CCG-data attribute character string comprising a character string
representing said attribute name and another character string representing an
empty CCG-data value,

h) if the said edit cursor is positioned outside a CCG phrase,

   i) inserting into said web page, at the character position indicated by said edit
   cursor, a start character string comprising HTML code indicative of the start
   of a CCG phrase,

   ii) inserting into said web page, immediately after the end of said start
   character string, an end character string comprising HTML code indicative of
   the end of a CCG phrase, and

   iii) positioning said edit cursor between said start and end character strings,

i) inserting said CCG-data attribute character string into said web page at the
character position indicated by said edit cursor,

j) positioning said edit cursor at the character position in said web page of the
CCG-data value of said inserted CCG-data attribute character string,

k) inputting characters using a keyboard,

l) inserting said input characters into said web page at the character position
indicated by said edit cursor, thereby converting said empty CCG-data value to a
non-empty CCG-data value, and

m) writing said web page on computer-readable media. 
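
Steps g) to l) of the editing method above can be sketched as plain string operations. This is only an illustration: the page is modelled as a string, the edit cursor as an integer index, and the markers `<ccg ` and `>` are hypothetical. The sketch also assumes attribute values never contain a raw `>` character.

```python
# Hypothetical start and end codes for a CCG phrase.
CCG_START = "<ccg "
CCG_END = ">"


def inside_ccg_phrase(page: str, cursor: int) -> bool:
    """Return True if the cursor lies between a start code and its end code."""
    start = page.rfind(CCG_START, 0, cursor)
    if start == -1:
        return False
    end = page.find(CCG_END, start + len(CCG_START))
    return end != -1 and cursor <= end


def insert_empty_attribute(page: str, cursor: int, name: str):
    """Insert name="" at the cursor, wrapping it in a new phrase if needed.

    Returns the new page and the cursor position of the empty value.
    """
    if not inside_ccg_phrase(page, cursor):
        # Step h): insert start and end codes, then move between them.
        page = page[:cursor] + CCG_START + CCG_END + page[cursor:]
        cursor += len(CCG_START)
    attribute = f'{name}="" '                      # step g)
    page = page[:cursor] + attribute + page[cursor:]  # step i)
    value_position = cursor + len(name) + 2        # step j): after the opening quote
    return page, value_position


def type_characters(page: str, cursor: int, text: str):
    """Steps k) and l): insert typed characters at the cursor."""
    return page[:cursor] + text + page[cursor:], cursor + len(text)
```

Used on a page such as `<p>Call us</p>` with the cursor just after `<p>`, the first call creates a new phrase; a later call with the cursor inside that phrase adds a second attribute without creating another phrase.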

5. A computer implemented method of building a web page comprising at least one HTML
encoded CCG phrase, the method comprising the steps of:

a) displaying a web page on a computer display device,

b) displaying a start edit cursor and an end edit cursor on said display device, each
said edit cursor indicating a character position on said display device and a
corresponding character position in said web page, said edit cursors being
positionable within the display of said web page by use of computer input devices,

c) separately displaying on said computer display device a set of edit controls 
representing CCG-data attribute types, 

d) selecting a string of web page characters on said display device using said input
devices to position said start edit cursor to indicate the start of said string of web
page characters and said end edit cursor to indicate the end of said string of web
page characters,

e) selecting an edit control from said set of edit controls using said input devices, 

f) relating said selected CCG-data control to a corresponding CCG-data attribute 
name, 

g) constructing a CCG-data attribute character string comprising a character string
representing said attribute name and another character string representing a
CCG-data value containing said string of web page characters,

h) deleting said string of web page characters from said web page,

i) if the said start edit cursor is positioned outside a CCG phrase,

   i) inserting into said web page, at the character position indicated by said start
   edit cursor, a start character string comprising HTML code indicative of the
   start of a CCG phrase,

   ii) inserting into said web page, immediately after the end of said start
   character string, an end character string comprising HTML code indicative of
   the end of a CCG phrase, and

   iii) positioning said start edit cursor between said start and end character
   strings,

j) inserting said CCG-data attribute character string into said web page at the
character position indicated by said start edit cursor, thereby converting said string
of web page characters to a CCG-data attribute value contained within a
CCG-data attribute contained within CCG-phrase, and

k) writing said web page on computer-readable media. 

6. A computer implemented method of building a web page comprising at least one HTML
encoded CCG phrase, the method comprising the steps of:

a) displaying a CCG-data input form on a computer display device,

b) inputting CCG-data values into fields of said data input form using computer input
devices,

c) inserting into the body of a web page a start character string comprising HTML
code indicative of the start of a CCG phrase,

d) inserting into said web page body immediately after the end of said start character
string an end character string comprising HTML code indicative of the end of a
CCG phrase,

e) extracting successive field values from said data entry form together with related
field value type information,

f) relating the type of each extracted field value to a corresponding CCG-data
attribute name,

g) constructing a CCG-data attribute character string comprising a character string
representing said attribute name and another character string representing said
field value,

h) inserting said CCG-data attribute character string into said web page between said
start and end character strings, and

i) writing said web page on computer-readable media.
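
The form-driven method above can be sketched as follows. The field labels, the attribute names in `FIELD_TO_ATTRIBUTE` and the markers `<ccg` and `>` are hypothetical. Displaying the form and writing the page to storage (steps a, b and i) are left out, so the function only covers steps c) to h).

```python
from html import escape

# Hypothetical start and end codes for a CCG phrase.
CCG_START = "<ccg"
CCG_END = ">"

# Assumed mapping from form field types to CCG-data attribute names (step f).
FIELD_TO_ATTRIBUTE = {
    "Business category": "class",
    "Telephone": "phone",
    "City": "city",
}


def add_ccg_phrase(page: str, form_fields: list) -> str:
    """Insert a CCG phrase built from (field type, value) pairs into the body."""
    body_at = page.lower().find("<body>")
    if body_at == -1:
        raise ValueError("page has no <body> tag")
    cursor = body_at + len("<body>")

    # Steps c) and d): start code, then end code immediately after it.
    page = page[:cursor] + CCG_START + CCG_END + page[cursor:]
    cursor += len(CCG_START)

    # Steps e) to h): one attribute string per field, placed between the codes.
    for field_type, value in form_fields:
        name = FIELD_TO_ATTRIBUTE[field_type]
        attribute = f' {name}="{escape(value, quote=True)}"'
        page = page[:cursor] + attribute + page[cursor:]
        cursor += len(attribute)
    return page
```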

7. A computer implemented method of building a database which comprises sets of
associated property values wherein each set includes at least two property values of
different types, the property values being any of classification values, contact values,
geographic location values, hereinafter collectively referred to as CCG-data, the method
comprising the steps of:

a) retrieving successive web pages from a computer network, each web page being
identified by a URL,

b) searching each web page for a CCG phrase that includes a plurality of different
types of CCG-data attributes,

c) extracting a plurality of said attributes from said phrase,

d) from each extracted attribute, deriving an attribute name and a related attribute
value,

e) determining the type of said extracted attribute and said attribute value by
reference to said attribute name,

f) relating said type of attribute value so determined to a corresponding type of
database property value,

g) relating the URL of said web page to an other type of database property value,

h) writing said derived attribute value to the database property value of said
determined corresponding type in a set of associated property values, and

i) writing the URL of said web page to a database property value of said other type
in said set of associated property values.

8. A computer implemented method of building a database which comprises sets of
associated property values wherein each set includes at least two property values of
different types, the property values being any of classification values, contact values,
geographic location values, hereinafter collectively referred to as CCG-data, the method
comprising the steps of:

a) retrieving successive web pages from a computer network, each web page being
identified by a URL,

b) searching each web page for a CCG phrase that includes at least one type of
CCG-data attribute,

c) extracting at least one said attribute from said phrase,

d) from each extracted attribute, deriving an attribute name and a related attribute
value,

e) determining the type of said extracted attribute and said attribute value by
reference to said attribute name,

f) relating said type of attribute value so determined to a corresponding type of
database property value,

g) relating the URL of said web page to an other type of database property value,

h) writing said derived attribute value to the database property value of said
determined corresponding type in a set of associated property values, and

i) writing the URL of said web page to a database property value of said other type
in said set of associated property values.

9. A computer implemented method of building a database which comprises sets of
associated property values wherein each set includes at least two property values of
different types, the property values being any of classification values, contact values,
geographic location values, hereinafter collectively referred to as CCG-data, the method
comprising the steps of:

a) retrieving successive web pages from a computer network,

b) searching each web page for a CCG phrase that includes a plurality of different
types of CCG-data attributes,

c) extracting a plurality of said attributes from said phrase,

d) from each extracted attribute, deriving an attribute name and a related attribute
value,

e) determining the type of said extracted attribute and said attribute value by
reference to said attribute name,

f) relating said type of attribute value so determined to a corresponding type of
database property value, and

g) writing said derived attribute value to the database property value of said
determined corresponding type in a set of associated property values.
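
The database-building steps of claims 7 to 9 can be sketched with the Python standard library. Everything concrete here is an assumption: the `<ccg ...>` phrase syntax, the attribute names, the table layout and the use of an in-memory SQLite database. Retrieval from the network is omitted, so pages are passed in as (URL, HTML) pairs.

```python
import re
import sqlite3
from html import unescape

# Hypothetical phrase syntax: <ccg name="value" name="value" ...>
PHRASE_RE = re.compile(r"<ccg\s+([^>]*)>", re.IGNORECASE)
ATTRIBUTE_RE = re.compile(r'([\w-]+)="([^"]*)"')

# Assumed mapping from attribute names to database property types.
ATTRIBUTE_TO_COLUMN = {
    "class": "classification",
    "phone": "contact",
    "city": "location",
}


def build_database(pages):
    """Build a table with one row (set of associated values) per CCG phrase."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE ccg (url TEXT, classification TEXT, contact TEXT, location TEXT)"
    )
    for url, html in pages:
        for phrase in PHRASE_RE.finditer(html):
            row = {"url": url, "classification": None, "contact": None, "location": None}
            for name, value in ATTRIBUTE_RE.findall(phrase.group(1)):
                # The attribute name determines the property type.
                column = ATTRIBUTE_TO_COLUMN.get(name.lower())
                if column is not None:
                    row[column] = unescape(value)
            db.execute(
                "INSERT INTO ccg VALUES (:url, :classification, :contact, :location)",
                row,
            )
    db.commit()
    return db
```

Storing the URL in the same row as the extracted values corresponds to the last two steps of claims 7 and 8; dropping the `url` column would give the variant of claim 9.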

10. A computer implemented method of finding references to web pages posted on a
computer network, the method using a database comprising sets of associated property
values, the property values being any of classification values, contact values, geographic
location values, hereinafter collectively referred to as CCG-data, and URL references,
the method comprising the steps of:

a) receiving a query phrase including query relational expressions from a computer
network,

b) parsing said query phrase and extracting each of said query relational expressions
included therein,

c) from each extracted query relational expression, deriving a query field name,

d) determining the type of said query relational expression by reference to its derived
query field name,

e) relating said type of query relational expression so determined to one of the
following query relational expression types: CCG-data type, other type,

f) provided said query relational expression is a CCG-data type, deriving a query
relational operator and query value related to its query field name from said query
relational expression,

g) determining the type of said query value by reference to said query field name,

h) relating said type of query value so determined to a corresponding type of
database property value,

i) locating database property values of said determined corresponding type which
return a true value when tested against said query value using said query
relational operator,

j) extracting from said database a list of the URL references associated with the so
located database property values.

11. A computer implemented method of finding sets of associated database property values
the method using a database comprising sets of associated property values wherein
each set includes at least two property values of different types, the property values
being any of classification values, contact values, geographic values, hereinafter
collectively referred to as CCG-data, the method comprising the steps of:

a) receiving a query phrase including query relational expressions from a computer
network,

b) parsing said query phrase and extracting each of said query relational expressions
included therein,

c) from each extracted query relational expression, deriving a query field name,

d) determining the type of said query relational expression by reference to its derived
query field name,

e) relating said type of query relational expression so determined to one of the
following query relational expression types: CCG-data type, other type,

f) provided said query relational expression is a CCG-data type, deriving a query
relational operator and query value related to its query field name from said query
relational expression,

g) determining the type of said query value by reference to said query field name,

h) relating said type of query value so determined to a corresponding type of
database property value,

i) locating database property values of said determined corresponding type which
return a true value when tested against said query value using said query
relational operator,

j) extracting from said database sets of associated database property values
associated with the so located database property values.
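
The query steps of claims 10 and 11 can be sketched over an in-memory list of records. The query syntax (expressions joined by `&`), the field names and the record keys are hypothetical. Expressions whose field name is not a CCG-data field are treated as the "other type" and skipped.

```python
import operator
import re

# Supported relational operators (an assumed, minimal set).
OPERATORS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt, ">": operator.gt}

# Assumed mapping from query field names to database property types.
CCG_FIELDS = {"class": "classification", "phone": "contact", "city": "location"}

EXPRESSION_RE = re.compile(r"\s*(\w+)\s*(!=|=|<|>)\s*(.+?)\s*$")


def parse_query(query: str):
    """Split a query phrase into (property, test function, value) conditions."""
    conditions = []
    for expression in query.split("&"):
        match = EXPRESSION_RE.match(expression)
        if match is None:
            raise ValueError(f"cannot parse expression: {expression!r}")
        field, symbol, value = match.groups()
        prop = CCG_FIELDS.get(field.lower())
        if prop is None:
            continue  # "other type" expression, not handled in this sketch
        conditions.append((prop, OPERATORS[symbol], value))
    return conditions


def find_urls(records, query: str):
    """Return the URLs of records whose property values satisfy every condition."""
    conditions = parse_query(query)
    return [
        record["url"]
        for record in records
        if all(
            record.get(prop) is not None and test(record[prop], value)
            for prop, test, value in conditions
        )
    ]
```

Returning the whole `record` instead of `record["url"]` would give the variant of claim 11, which extracts sets of associated property values rather than URL references.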

12. A method of displaying a web page comprising at least one HTML encoded CCG
phrase, the method comprising the steps of:

a) retrieving a web page from a computer network,

b) parsing said retrieved web page to locate an HTML code indicative of the start of a
CCG phrase,

c) parsing said located CCG phrase and extracting successive CCG attributes
contained therein until an HTML code indicative of the end of said CCG phrase is
found,

d) from each extracted attribute, deriving an attribute name,

e) determining the type of said extracted attribute by reference to its derived attribute
name,

f) relating said type of attribute so determined to one of the following attribute types:
database control, display control, CCG-data,

g) provided said extracted attribute is not a database control type, deriving an
attribute value related to its attribute name from said extracted attribute,

h) determining the type of said attribute value by reference to said attribute name,

i) relating said type of attribute value so determined to a corresponding type of
parameter of a display-device-control-program,

j) writing said attribute value to said parameter, and

k) where said type of attribute is a CCG-data type, causing said display-device-
control-program to effect display of said attribute value on a display device,
formatted and positioned according to said display-device-control-program
parameters whereby successive values of CCG-data of the CCG phrase are
displayed.
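
The display method above can be sketched as a function that turns CCG phrases into lines of text. This is a loose illustration: the `<ccg ...>` syntax, the attribute names and the single display parameter `label` are hypothetical, and a list of strings stands in for the display-device-control-program.

```python
import re
from html import unescape

# Hypothetical phrase syntax and attribute classification (step f).
PHRASE_RE = re.compile(r"<ccg\s+([^>]*)>", re.IGNORECASE)
TOKEN_RE = re.compile(r'([\w-]+)(?:="([^"]*)")?')
ATTRIBUTE_TYPES = {
    "newset": "database control",
    "label": "display control",
    "class": "CCG-data",
    "phone": "CCG-data",
    "city": "CCG-data",
}


def render_ccg(html: str):
    """Return the text lines a browser might display for each CCG-data value."""
    lines = []
    for phrase in PHRASE_RE.finditer(html):
        parameters = {"label": ""}  # display parameters, reset for each phrase
        for name, value in TOKEN_RE.findall(phrase.group(1)):
            kind = ATTRIBUTE_TYPES.get(name.lower())
            if kind is None or kind == "database control":
                continue  # database controls play no part in display
            if kind == "display control":
                parameters[name.lower()] = unescape(value)  # step j)
            else:
                # Step k): display the value using the current parameters.
                lines.append(f"{parameters['label']}{unescape(value)}")
    return lines
```

In this sketch a display control affects every CCG-data value that follows it within the same phrase; the claim itself does not specify how long a display parameter stays in effect.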






ABSTRACT 

A system for automatically creating databases containing industry, service, product and
subject classification data, contact data, geographic location data (CCG-data) and links to web
pages from HTML, XML or SGML encoded web pages posted on computer networks such as
the Internet or Intranets. The web pages containing HTML, XML or SGML encoded CCG-data,
database update controls and web browser display controls are created and modified by using
simple text editors, HTML, XML or SGML editors or purpose built editors. The CCG databases
may be searched for references (URLs) to web pages by use of enquiries which reference one
or more of the items of the CCG-data. Alternatively, enquiries referencing the CCG-data in the
databases may supply contact data without web page references. Data duplication and
coordination is reduced by including in the web page CCG-data display controls which are
used by web browsers to format for display the same data that is used to automatically update
the databases.



